Class 4 – 11/12/2002

Subroutines (Functions)

Introduction

http://www.perldoc.com/perl5.8.0/pod/perlsub.html


A subroutine may be declared as follows:

sub NAME { ... }


and called as:

NAME(arg1, arg2, ...); ## Arguments are optional!

## or

&NAME(arg1, arg2, ...);


Any arguments passed to the subroutine arrive in the array @_. So to get “arg1” from the example above, you would do something like this:

sub NAME {

my $arg1 = $_[0];

## Or you could do this instead (better)

my ($arg1, $arg2) = @_;

}
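As a small sketch of how @_ behaves when fewer arguments are passed than the subroutine expects: the name describe_args is made up for this illustration and is not part of the class programs.

```perl
use strict;
use warnings;

## describe_args is a hypothetical name used only for this illustration
sub describe_args {
    ## @_ holds every argument passed in; scalar(@_) is how many there are
    my $count = scalar(@_);

    ## Copy the first two arguments into named lexical variables
    my ($first, $second) = @_;

    ## An argument that was not supplied is undef, so fall back to a label
    $first  = "(none)" unless defined $first;
    $second = "(none)" unless defined $second;

    return "count=$count first=$first second=$second";
}

print describe_args("a", "b"), "\n";   ## count=2 first=a second=b
print describe_args("a"), "\n";        ## count=1 first=a second=(none)
print describe_args(), "\n";           ## count=0 first=(none) second=(none)
```

Copying @_ into named variables at the top of the subroutine keeps the rest of the body readable and avoids changing the caller's values by accident.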



The return value of the subroutine is the value of the last expression evaluated, and can be either a list or a scalar. Alternatively (and preferably), a return statement may be used to specify the returned value and exit the subroutine.


sub NAME {

my ($arg1, $arg2) = @_;

my $answer = $arg1 * $arg2;

return($answer);

}
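The two ways of returning a value, and the difference between returning a scalar and a list, might look like this. The names square and min_max are hypothetical helpers invented for this sketch.

```perl
use strict;
use warnings;

## Implicit return: the value of the last expression evaluated is returned
sub square {
    my ($n) = @_;
    $n * $n;
}

## Explicit return of a list: the caller receives several values at once
sub min_max {
    my @numbers = @_;
    my ($min, $max) = ($numbers[0], $numbers[0]);
    foreach my $n (@numbers) {
        $min = $n if $n < $min;
        $max = $n if $n > $max;
    }
    return ($min, $max);
}

print square(7), "\n";                   ## 49

my ($low, $high) = min_max(4, 9, 2, 7);
print "low=$low high=$high\n";           ## low=2 high=9
```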


You can define functions wherever you like: the Perl compiler will find them during the compilation phase, and make them available to your code by the time the body of the program is executed. You don't have to worry about defining functions before calling them.
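A minimal sketch of that point, using a made-up subroutine called greet: the call comes first in the file and the definition follows it.

```perl
use strict;
use warnings;

## The call appears before the definition; with parentheses this works
print greet("class"), "\n";   ## Hello, class!

## greet is a hypothetical subroutine used only for this illustration
sub greet {
    my ($name) = @_;
    return "Hello, $name!";
}
```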


Example functions

## Function add, adds two numbers and returns the new number

sub add {

my ($number1, $number2) = @_;

my $result = $number1 + $number2;

return($result);

}


Here is a real subroutine taken from one of my programs:


###############################################################################################

## FUNCTION:

## openLogFile ( $filename )

##

##

## DESCRIPTION:

## Opens the file $filename and attaches it to the filehandle "LOGFILE". Returns 0

## on success and non-zero on failure. Any generated error message will get set in

## global variable $!.

##

##

## Example:

## openLogFile ("/var/log/scanAlert.log");

##

###############################################################################################

sub openLogFile {

## Get the incoming filename

my $filename = $_[0];

## Make sure our file exists, and if the file doesn't exist then create it

if ( ! -f $filename ) {

printmsg("NOTICE: The file [$filename] does not exist. Creating it now with mode [0600].", 0);

open (LOGFILE, ">>$filename");

close LOGFILE;

chmod (0600, $filename);

}

## Now open the file and attach it to a filehandle

open (LOGFILE,">>$filename") or return (1);

## Put the file into non-buffering mode

select LOGFILE;

$| = 1;

select STDOUT;

## Tell the rest of the program that we can log now

$conf{'logging'} = "yes";

## Return success

return(0);

}


Notice the comments at the top of the function - this is extremely important!


The function above can get called like this:

## Open the LogFile

if (openLogFile($conf{'logFile'})) {

## If it failed print an error message

printmsg("ERROR - Opening the file [$conf{'logFile'}] for appending returned the error: $!", 1);

}

## Otherwise print the incoming message to the logfile

else {

print LOGFILE localtime() . " - $$ - $conf{'programName'} - $incoming{'message'}\n";

close LOGFILE;

}
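For comparison, here is a sketch of the same idea written with a lexical filehandle and the three-argument form of open, which is a different technique from the global LOGFILE handle used above. The name open_log is hypothetical, the file permission handling is left out, and a temporary file stands in for a real log so the example can run anywhere.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;

## open_log is a hypothetical variant of openLogFile for illustration.
## Returns a filehandle on success, or undef on failure (error text in $!)
sub open_log {
    my ($filename) = @_;

    ## Three-argument open in append mode creates the file if it is missing
    open(my $fh, '>>', $filename) or return;

    ## Turn off output buffering on this handle only
    $fh->autoflush(1);

    return $fh;
}

## Demonstrate against a temporary file rather than a real log
my (undef, $path) = tempfile(UNLINK => 1);

my $log = open_log($path);
if (!defined $log) {
    print "ERROR - Opening the file [$path] for appending returned the error: $!\n";
}
else {
    print {$log} scalar(localtime()) . " - $$ - test message\n";
    close $log;
}
```

Returning the handle instead of a status code means the caller does not need to know the name of a global filehandle, at the cost of changing how the function is called.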


Here is another function taken from my sendSNPP.pl program (available at http://caspian.dotconf.net/). Notice that it has a foreach loop and several if statements, and that it reads from and sends data to a network server (which must already be connected to the SERVER filehandle before this function is called). This function also makes use of regular expressions, which we will cover below.

###############################################################################################

## Function: SNPPchat (string $receiver, string $message )

##

## Description: Communicates with SNPP server and sends message to

## specified pager

##

## Input: $receiver Pager number to send message to. This may be more than one

## pager number separated by whitespace. Not all servers

## support this.

## $message Message that will be sent to the pager

##

##

## Output: Returns zero on success, or non-zero on error.

## Error messages will be stored in global $conf{'error'}

##

##

## Example: SNPPchat("2223333", "Hi Bob, call me.");

###############################################################################################

sub SNPPchat {

my %incoming = ();

( $incoming{'receiver'},

$incoming{'message'},

) = @_;

## Make sure the system greeted us correctly

my $status = <SERVER>;

## If the result isn't correct, return an error code and message

if ($status !~ /220/i) {

$status =~ s/$CRLF//;

$conf{'error'} = "The system didn't greet us properly! The server returned [$status]";

return(1);

}

## Input pager number into system and check if pager is known to system

foreach my $receiver (split(/\s+/, $incoming{'receiver'})) {

print SERVER "PAGER $receiver$CRLF";

$status = <SERVER>;

## If the result isn't correct, return an error code and message

if ($status !~ /250/i) {

$status =~ s/$CRLF//;

$conf{'error'} = "Pager ID seems unknown to the system. The server returned [$status]";

return(2);

}

}

## Enter message to be sent

print SERVER "MESS $incoming{'message'}$CRLF";

$status = <SERVER>;

## If the result isn't correct, return an error code and message

if ($status !~ /250/i) {

$status =~ s/$CRLF//;

$conf{'error'} = "Error entering the message to send. The server returned [$status]";

return(3);

}

## Send the actual message

print SERVER "SEND$CRLF";

$status = <SERVER>;

## If the result isn't correct, return an error code and message

if ($status !~ /250/i) {

$status =~ s/$CRLF//;

$conf{'error'} = "Error submitting the message. The server returned [$status]";

return(4);

}

## Quit to terminate the connection

print SERVER "QUIT$CRLF";

$status = <SERVER>;

## If the result isn't correct, return an error code and message

if ($status !~ /221/i) {

$status =~ s/$CRLF//;

$conf{'error'} = "Error sending the QUIT command. The server returned [$status]";

return(5);

}

## Return Success

return(0);

}



In my script this function gets called like this:

## Send the page

if (SNPPchat($conf{'receiver'}, $conf{'message'})) {

quit($conf{'error'}, 1);

}


But I could also call it like this:

## Send the page

my $status = SNPPchat($conf{'receiver'}, $conf{'message'});

if ($status != 0) {

print "ERROR - The server returned the error code: $status\n";

}

else {

print "SUCCESS - The server accepted our page\n";

}






Built-in Functions

http://www.perldoc.com/perl5.6/pod/perlfunc.html

http://caspian.dotconf.net/menu/Links/Useful_Things/Tutorials/perl-all.html#pl-exp-arith.html











Regular Expressions

Introduction

http://www.perldoc.com/perl5.8.0/pod/perlretut.html


A regular expression (regex) describes a pattern of text (rather than merely a literal substring of text) for matching, extracting, or replacing with something else. We create such patterns using the regex language features, consisting largely of literal characters (alphanumeric and a few others) that stand for themselves, and several special characters or character sequences representing particular meanings within a regex pattern.


The ordinary pattern match operator looks like /pattern/. It matches against the $_ variable by default and returns true (1) if it matches and a false value (“”) if it doesn't.


if (/apple/) { ... } ## True if $_ contains “apple”


The substitution operator looks like s/pattern/replacement/. This operator searches $_ by default. If it finds the specified pattern, it replaces the matched text with the replacement string and returns a true value. If the pattern is not matched, nothing happens and a false value is returned.


if (s/apple/pear/) { ... } ## If $_ contains “apple”, replace it with “pear”


You may specify a variable other than $_ with the =~ binding operator (or the negated version of it, !~ which returns true if the pattern is not matched). For example:


if ($text =~ /quake/) { ... } ## If $text contains the phrase “quake”

if ($text !~ /quake/) { ... } ## If $text does not contain the phrase “quake”
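A short sketch of the match and substitution operators bound to named variables, including what each one returns. The sample string is made up for this illustration.

```perl
use strict;
use warnings;

my $text = "an apple a day, another apple at night";

## A match returns true or false
my $found = ($text =~ /apple/) ? "yes" : "no";
print "found apple: $found\n";     ## found apple: yes

## Without /g, s/// replaces only the first occurrence and returns 1
my $first = $text;
my $count_first = ($first =~ s/apple/pear/);
print "$count_first: $first\n";    ## 1: an pear a day, another apple at night

## With /g it replaces every occurrence and returns how many it replaced
my $all = $text;
my $count_all = ($all =~ s/apple/pear/g);
print "$count_all: $all\n";        ## 2: an pear a day, another pear at night

## !~ is true when the pattern is NOT found
print "no quake here\n" if $text !~ /quake/;
```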


There are a few basic things that you need to know to get started.


1) Concatenation: We can create larger, more complex patterns simply by writing simpler patterns one after another. For example, /f/ is a pattern that matches the character 'f', while /o/ is a pattern that matches the character 'o'. If we combine these into /foo/, then we can match the character sequence 'foo'.


2) Alternation: The '|' character is a meta-character inside a regular expression. It acts as an operator allowing us to specify two or more alternative sub patterns. For example, the pattern m/ab|cd/ will match either 'ab' or 'cd'.


3) Grouping: Parentheses supply a way to create subexpressions treated as a unit. If, for example, we want to match zero or more occurrences of the substring 'foo', then we could specify our pattern as: m/(foo)*/. Placing * outside of the parentheses applies it to the whole parenthesized subexpression. Parentheses also govern the scope of alternation: the pattern m/ab|cd/ means match either 'ab' or 'cd', but the pattern m/a(b|c)d/ means match an 'a', then either a 'b' or a 'c', and finally a 'd'.


4) Wildcard: The dot . is the wildcard character. It matches any character other than a newline character (this can be changed to include the newline as well). Thus, the pattern: m/f.*bar/ will match an 'f' followed by zero-or-more of any characters, followed by 'bar'.


5) Every regular expression can have one or more modifiers that affect the behavior of the regular expression:

Modifier Meaning

g Match globally, i.e., find all occurrences

i Do case-insensitive pattern matching

m Treat string as multiple lines (^ and $ match at each line)

o Only compile pattern once

s Treat string as a single line (. also matches a newline)

x Use extended regular expressions (whitespace and comments allowed in the pattern)


Examples:

$x = "There once was a girl who programmed in Perl\n";


$x =~ /once was a girl/; ## Does match

$x =~ /once was a Girl/; ## Doesn't match

$x =~ /once was a Girl/i; ## Does match
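The basics above (grouping, alternation, the wildcard, and a few modifiers) can be tried out with a sketch like the following. The test words are made up; $x is the same sentence used in the examples.

```perl
use strict;
use warnings;

## Alternation scoped by grouping: 'a', then 'b' or 'c', then 'd'
foreach my $word ("abd", "acd", "ad", "abcd") {
    my $result = ($word =~ /a(b|c)d/) ? "matches" : "does not match";
    print "$word $result\n";
}

## Wildcard with a quantifier: 'f', any characters, then 'bar'
print "wildcard matched\n" if "foo and bar" =~ /f.*bar/;

my $x = "There once was a girl who programmed in Perl\n";

## /g in list context returns every match; /i ignores case
my @hits = ($x =~ /er/gi);
print scalar(@hits), " matches of 'er'\n";    ## 2 matches of 'er'

## Under the x modifier, spaces inside the pattern are ignored
my $spaced = ($x =~ / once \s+ was \s+ a /x) ? "yes" : "no";
print "spaced pattern matched: $spaced\n";    ## spaced pattern matched: yes
```

The loop prints that "abd" and "acd" match while "ad" and "abcd" do not, because exactly one of 'b' or 'c' must sit between the 'a' and the 'd'.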



Quantifiers

The quantifier meta characters ?, *, +, and {} allow us to determine the number of repeats of a portion of a regexp we consider to be a match. Quantifiers are put immediately after the character, character class, or grouping that we want to specify. They have the following meanings:

? match the previous element 1 or 0 times

* match the previous element 0 or more times, i.e., any number of times

+ match the previous element 1 or more times, i.e., at least once

{n,m} match the previous element at least n times, but not more than m times.

{n,} match the previous element at least n or more times

{n} match the previous element exactly n times
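One way to see the quantifiers in action is to run a table of patterns against sample strings. This is only a sketch: the helper name matches and the sample strings are made up for the illustration.

```perl
use strict;
use warnings;

## matches is a hypothetical helper: true if $string matches $pattern
sub matches {
    my ($pattern, $string) = @_;
    return ($string =~ /$pattern/) ? 1 : 0;
}

## Each entry pairs a pattern with a string to test it against
my @tests = (
    [ 'ab?c',     'ac'      ],   ## ? : 'b' may appear 0 or 1 times
    [ 'ab*c',     'ac'      ],   ## * : 'b' may appear 0 or more times
    [ 'ab+c',     'ac'      ],   ## + : 'b' must appear at least once
    [ 'ab{3}c',   'abbbc'   ],   ## exactly 3 times
    [ 'ab{2,}c',  'abbbbbc' ],   ## at least 2 times
    [ 'ab{2,4}c', 'abbbbbc' ],   ## between 2 and 4 times (5 is too many)
);

foreach my $test (@tests) {
    my ($pattern, $string) = @$test;
    my $result = matches($pattern, $string) ? "matches" : "does not match";
    print "/$pattern/ $result '$string'\n";
}
```

Only the third and the last entries fail to match: 'ac' has no 'b' for + to find, and 'abbbbbc' has five 'b' characters where {2,4} allows at most four.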



Character classes

The [ ... ] construct is used to list a set of characters (a character class) of which one will match. Brackets are often used when capitalization is uncertain in a match:

/[tT]here/ matches either “there” or “There”


A dash “-” can be used to indicate a range of characters in a character class.

/[a-zA-Z]/ matches any upper-case or lower-case letter

/[0-9]/ matches any single digit


To put a literal dash in the list, prefix it with a backslash (i.e. \-) or place it first or last in the class. Placing a caret (^) as the first element in a character class negates it.

/[^A-Z]/ matches anything but an upper-case letter



There are several predefined character classes:

\d a digit, same as [0-9]

\D a nondigit, same as [^0-9]

\w a word character, same as [a-zA-Z_0-9]

\W a nonword character, [^a-zA-Z_0-9]

\s a whitespace character, same as [ \t\n\r\f]

\S a non-whitespace character, [^ \t\n\r\f]

The period '.' any character but "\n" – this is the regular expression wild card.
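A brief sketch of character classes at work, using a made-up sample line:

```perl
use strict;
use warnings;

my $line = "Order 66 shipped to Bob_Smith on 2002-11-12";

## A bracketed class matches one character from the set
print "greeting found\n" if "There you are" =~ /[tT]here/;

## \d+ with /g collects each run of digits
my @numbers = ($line =~ /\d+/g);
print join(",", @numbers), "\n";     ## 66,2002,11,12

## \w+ treats the underscore as a word character, so Bob_Smith is one word
my @words = ($line =~ /\w+/g);
print scalar(@words), " words\n";    ## 9 words

## A negated class: delete everything that is not a digit or a dash
## (the dash is placed last in the class so it is taken literally)
my $digits_only = $line;
$digits_only =~ s/[^0-9-]//g;
print "$digits_only\n";              ## 662002-11-12
```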



Anchors

Anchors don't match any characters; they match places within a string. The two most common anchors are ^ and $, which match the beginning and end of a line, respectively. Here are the anchors that can be used in regular expressions.

^ matches at the beginning of the string (or line if /m was used)

$ matches at the end of a string (or line if /m was used)

\b matches at word boundary (between \w and \W)

\B matches except at word boundary

\A matches at the beginning of the string

\Z matches at the end of the string, or just before a newline at the end of the string

\z matches only at the end of the string

\G matches where previous m//g left off
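The differences between several of these anchors are easiest to see on a string that contains more than one line. The following is a sketch with a made-up two-line string.

```perl
use strict;
use warnings;

my $text = "cat\nconcatenate\n";

## \b requires a word boundary on each side; \B requires there to be none
print "whole word 'cat' found\n" if $text =~ /\bcat\b/;
print "'cat' inside a word\n"    if $text =~ /\Bcat\B/;

## Without the m modifier, ^ matches only at the start of the string
my @plain = ($text =~ /^c/g);
## With the m modifier, ^ also matches just after each embedded newline
my @multi = ($text =~ /^c/gm);
print scalar(@plain), " vs ", scalar(@multi), "\n";   ## 1 vs 2

## \Z tolerates a final newline; \z does not
my $big_z   = ($text =~ /concatenate\Z/) ? "yes" : "no";
my $small_z = ($text =~ /concatenate\z/) ? "yes" : "no";
print "\\Z: $big_z, \\z: $small_z\n";                 ## \Z: yes, \z: no
```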



Special Variables

Parentheses not only serve to group elements in a regular expression, they also remember the patterns they match. Every match from a parenthesized element is saved to a special, read-only variable indicated by a number. You can recall and reuse a match by using these variables.


Within a regular expression, each parenthesized element saves its match to a numbered variable, in order starting with 1. You can recall these matches within the expression by using \1, \2, \3, and so on.


Outside of the regular expression, the matched variables are recalled with the usual dollar-sign i.e. $1, $2, $3, and so on.

$text = "Homer said, mmmm..";

$text =~ s/(Homer) said, (m+)/$2 is what $1 said/; # Results in "mmmm is what Homer said.."


Other back-referencing variables:

$+ the last parenthesized pattern match

$& the entire matched string

$` everything before the matched string

$' everything after the matched string


So for example:

$text = "Homer said, mmmm..";

if ($text =~ /Homer said, /) {

print "$'\n"; ## Prints "mmmm.."

}
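A sketch that pulls the numbered variables together, with made-up sample strings: $1 and $2 after a match, captures returned in list context, and \1 used inside the pattern itself.

```perl
use strict;
use warnings;

my $entry = "homer=donut";

## After a successful match, $1 and $2 hold the parenthesized groups
if ($entry =~ /(\w+)=(\w+)/) {
    print "key=$1 value=$2\n";          ## key=homer value=donut
}

## In list context a match returns its captures directly
my ($key, $value) = ($entry =~ /(\w+)=(\w+)/);
print "$key likes $value\n";            ## homer likes donut

## \1 inside the pattern refers back to what group 1 just matched,
## which makes it easy to find a doubled word
my $sentence = "this is is a test";
my $doubled = "";
if ($sentence =~ /\b(\w+) \1\b/) {
    $doubled = $1;
    print "doubled word: $doubled\n";   ## doubled word: is
}
```

Always test that the match succeeded before reading $1, $2, and so on: after a failed match they still hold the values from the last successful one.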



Homework

Homework will be emailed out. This time the homework will be more pre-defined: I will send out some sample files that you can build Perl scripts against.